Student: SMIT, GIJS (0905883)
Loading model from file Success!
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
reshape_1 (Reshape)          (None, 3072)              0
_________________________________________________________________
dense_2 (Dense)              (None, 1152)              3540096
_________________________________________________________________
dense_3 (Dense)              (None, 576)               664128
_________________________________________________________________
dense_4 (Dense)              (None, 288)               166176
_________________________________________________________________
dense_5 (Dense)              (None, 144)               41616
_________________________________________________________________
dense_6 (Dense)              (None, 10)                1450
=================================================================
Total params: 4,413,466
Trainable params: 4,413,466
Non-trainable params: 0
_________________________________________________________________

         loss  accuracy  val_loss  val_accuracy
min  0.380644  0.384159  0.557180      0.527979
max  1.839650  0.886520  1.477031      0.845093

Our network consists of four hidden layers, each with half as many nodes as the previous one: 1152-576-288-144. We experimented extensively to find a good design: too narrow a layer loses information, while too wide a layer overfits. The network achieved a decent validation accuracy of 84.5% without overfitting too much. We used Adam as the optimizer, as it performed best and converged the fastest. Adam worked best in combination with a learning rate of 1.25e-5, a batch size of 12, and 40 epochs.
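The architecture and training setup described above can be sketched as follows; layer names differ from the saved model, and the loss function is an assumption since the report does not state it:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Halving dense network: 3072 inputs -> 1152 -> 576 -> 288 -> 144 -> 10
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Reshape((3072,))(inputs)
for units in (1152, 576, 288, 144):
    x = layers.Dense(units, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs, outputs)

# Adam with the reported learning rate; loss choice is an assumption
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1.25e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# training as reported: model.fit(x_train, y_train, batch_size=12, epochs=40, ...)
```

The parameter count of this sketch matches the summary above (4,413,466).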
Loading model from file Success!
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
reshape_2 (Reshape)          (None, 1024)              0
_________________________________________________________________
dense_7 (Dense)              (None, 1152)              1180800
_________________________________________________________________
dense_8 (Dense)              (None, 576)               664128
_________________________________________________________________
dense_9 (Dense)              (None, 288)               166176
_________________________________________________________________
dense_10 (Dense)             (None, 144)               41616
_________________________________________________________________
dense_11 (Dense)             (None, 10)                1450
=================================================================
Total params: 2,054,170
Trainable params: 2,054,170
Non-trainable params: 0
_________________________________________________________________

         loss  accuracy  val_loss  val_accuracy
min  0.211858  0.480879  0.499776      0.633474
max  1.600264  0.940136  1.190097      0.856864

We converted the images to grayscale and increased the contrast to make the digits more distinguishable. Applying standardization (zero mean and unit variance) resulted in significantly more overfitting, so we left it out. The same model from 1.1 now achieved a validation accuracy of 85.7%, which is 1.2% better. However, the overfitting is now much larger compared to 1.1. We conclude that dense networks cannot exploit separate color channels well, so they learn better features from grayscale images.
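The grayscale-plus-contrast preprocessing can be sketched with NumPy; the luminance weights and the contrast-stretching method are assumptions, since the report does not specify the exact conversion:

```python
import numpy as np

def preprocess(images):
    """Convert RGB images to grayscale and stretch the contrast per image."""
    # standard luminance weights (an assumption; the report does not state them)
    gray = images @ np.array([0.299, 0.587, 0.114])
    # min-max contrast stretch per image to the full [0, 1] range
    lo = gray.min(axis=(1, 2), keepdims=True)
    hi = gray.max(axis=(1, 2), keepdims=True)
    return (gray - lo) / (hi - lo + 1e-8)

x = np.random.rand(4, 32, 32, 3)  # stand-in for a batch of RGB digit images
out = preprocess(x)               # shape (4, 32, 32), values in [0, 1]
```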
Loading model from file Success!
Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
reshape_5 (Reshape)          (None, 1024)              0
_________________________________________________________________
block1_dense (Dense)         (None, 1152)              1180800
_________________________________________________________________
block1_batchnorm (BatchNorma (None, 1152)              4608
_________________________________________________________________
block1_dropout (Dropout)     (None, 1152)              0
_________________________________________________________________
block2_dense (Dense)         (None, 576)               664128
_________________________________________________________________
block2_batchnorm (BatchNorma (None, 576)               2304
_________________________________________________________________
block2_dropout (Dropout)     (None, 576)               0
_________________________________________________________________
block3_dense (Dense)         (None, 288)               166176
_________________________________________________________________
block3_batchnorm (BatchNorma (None, 288)               1152
_________________________________________________________________
block3_dropout (Dropout)     (None, 288)               0
_________________________________________________________________
block4_dense (Dense)         (None, 144)               41616
_________________________________________________________________
block4_batchnorm (BatchNorma (None, 144)               576
_________________________________________________________________
block4_dropout (Dropout)     (None, 144)               0
_________________________________________________________________
block5_fc (Dense)            (None, 10)                1450
=================================================================
Total params: 2,062,810
Trainable params: 2,058,490
Non-trainable params: 4,320
_________________________________________________________________

         loss  accuracy  val_loss  val_accuracy
min  0.347027  0.602071  0.377585      0.623655
max  1.235725  0.888093  1.173410      0.888777

We regularized the model by adding batchnorm and dropout after each dense layer. L1 and L2 regularization gave no noticeable improvement in combination with batchnorm and dropout, so we left them out. The regularized model achieved a validation accuracy of 88.9%, a 3.2% improvement over 1.2. Moreover, the model is no longer overfitting: regularization has decreased the gap between training and validation accuracy while improving the overall performance of the dense network.
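A minimal sketch of the regularized architecture described above; the dropout rate is an assumption, since the report does not state it:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each dense layer is followed by batchnorm and dropout
inputs = keras.Input(shape=(32, 32, 1))  # grayscale input from 1.2
x = layers.Reshape((1024,))(inputs)
for units in (1152, 576, 288, 144):
    x = layers.Dense(units, activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)  # rate 0.3 is an assumption
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs, outputs)
```

The parameter count of this sketch matches the summary above (2,062,810 total).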
Loading model from file Success!
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
block1_conv (Conv2D)         (None, 32, 32, 64)        640
_________________________________________________________________
block1_batchnorm (BatchNorma (None, 32, 32, 64)        256
_________________________________________________________________
block1_dropout (Dropout)     (None, 32, 32, 64)        0
_________________________________________________________________
block2_conv (Conv2D)         (None, 32, 32, 128)       73856
_________________________________________________________________
block2_batchnorm (BatchNorma (None, 32, 32, 128)       512
_________________________________________________________________
block2_dropout (Dropout)     (None, 32, 32, 128)       0
_________________________________________________________________
block3_conv (Conv2D)         (None, 32, 32, 128)       147584
_________________________________________________________________
block3_batchnorm (BatchNorma (None, 32, 32, 128)       512
_________________________________________________________________
block3_dropout (Dropout)     (None, 32, 32, 128)       0
_________________________________________________________________
block4_conv (Conv2D)         (None, 32, 32, 128)       147584
_________________________________________________________________
block4_batchnorm (BatchNorma (None, 32, 32, 128)       512
_________________________________________________________________
block4_pooling (MaxPooling2D (None, 16, 16, 128)       0
_________________________________________________________________
block4_dropout (Dropout)     (None, 16, 16, 128)       0
_________________________________________________________________
block5_conv (Conv2D)         (None, 16, 16, 128)       147584
_________________________________________________________________
block5_batchnorm (BatchNorma (None, 16, 16, 128)       512
_________________________________________________________________
block5_dropout (Dropout)     (None, 16, 16, 128)       0
_________________________________________________________________
block6_conv (Conv2D)         (None, 16, 16, 128)       147584
_________________________________________________________________
block6_batchnorm (BatchNorma (None, 16, 16, 128)       512
_________________________________________________________________
block6_dropout (Dropout)     (None, 16, 16, 128)       0
_________________________________________________________________
block7_conv (Conv2D)         (None, 16, 16, 256)       295168
_________________________________________________________________
block7_batchnorm (BatchNorma (None, 16, 16, 256)       1024
_________________________________________________________________
block7_pooling (MaxPooling2D (None, 8, 8, 256)         0
_________________________________________________________________
block7_dropout (Dropout)     (None, 8, 8, 256)         0
_________________________________________________________________
block8_conv (Conv2D)         (None, 8, 8, 256)         590080
_________________________________________________________________
block8_batchnorm (BatchNorma (None, 8, 8, 256)         1024
_________________________________________________________________
block8_dropout (Dropout)     (None, 8, 8, 256)         0
_________________________________________________________________
block9_conv (Conv2D)         (None, 8, 8, 256)         590080
_________________________________________________________________
block9_batchnorm (BatchNorma (None, 8, 8, 256)         1024
_________________________________________________________________
block9_pooling (MaxPooling2D (None, 4, 4, 256)         0
_________________________________________________________________
block9_dropout (Dropout)     (None, 4, 4, 256)         0
_________________________________________________________________
block10_conv (Conv2D)        (None, 4, 4, 512)         1180160
_________________________________________________________________
block10_batchnorm (BatchNorm (None, 4, 4, 512)         2048
_________________________________________________________________
block10_dropout (Dropout)    (None, 4, 4, 512)         0
_________________________________________________________________
block11_conv (Conv2D)        (None, 4, 4, 2048)        1050624
_________________________________________________________________
block11_dropout (Dropout)    (None, 4, 4, 2048)        0
_________________________________________________________________
block12_conv (Conv2D)        (None, 4, 4, 256)         524544
_________________________________________________________________
block12_pooling (MaxPooling2 (None, 2, 2, 256)         0
_________________________________________________________________
block12_dropout (Dropout)    (None, 2, 2, 256)         0
_________________________________________________________________
block13_conv (Conv2D)        (None, 2, 2, 256)         590080
_________________________________________________________________
block13_pooling (MaxPooling2 (None, 1, 1, 256)         0
_________________________________________________________________
block13_dropout (Dropout)    (None, 1, 1, 256)         0
_________________________________________________________________
block14_flatten (Flatten)    (None, 256)               0
_________________________________________________________________
block14_fc (Dense)           (None, 10)                2570
=================================================================
Total params: 5,496,074
Trainable params: 5,492,106
Non-trainable params: 3,968
_________________________________________________________________

         loss  accuracy  val_loss  val_accuracy
min  0.098163  0.173848  0.183210      0.202430
max  2.351455  0.970965  2.369631      0.956631

We implemented a network inspired by the SimpleNet architecture (HasanPour et al., 2016). SimpleNet is a relatively simple architecture that can outperform deeper and more complex architectures (e.g. on CIFAR-10) and offers a good trade-off between computational efficiency and accuracy. SimpleNet consists of typical building blocks, each containing a conv, batchnorm, and dropout layer. We adjusted the ordering in each block so that batchnorm is applied after each conv layer (instead of before). We trained the model for 25 epochs using Adam as the optimizer. The trained model overfits slightly; this was intentional, so that the model can still learn more once data augmentation is applied. The model achieved a validation accuracy of 95.7%, which is decent for such a small network.
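The adjusted SimpleNet-style building block (conv, then batchnorm, then optional pooling and dropout) can be sketched as a small helper; the kernel size, activation, and dropout rate shown here are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(x, filters, pool=False, dropout=0.2):
    """One block: Conv2D -> BatchNorm -> (optional MaxPool) -> Dropout."""
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)  # batchnorm after the conv, as adjusted
    if pool:
        x = layers.MaxPooling2D()(x)
    return layers.Dropout(dropout)(x)

# First few blocks of the network, on grayscale 32x32 input
inputs = keras.Input(shape=(32, 32, 1))
x = conv_block(inputs, 64)
x = conv_block(x, 128)
x = conv_block(x, 128, pool=True)  # pooling halves the spatial size
model = keras.Model(inputs, x)
```

The first conv layer of this sketch has 640 parameters, matching the summary above.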
Loading model from file Success!
Model: "sequential_9" (architecture identical to sequential_4 above: total params 5,496,074, trainable params 5,492,106, non-trainable params 3,968)

         loss  accuracy  val_loss  val_accuracy
min  0.102976  0.187775  0.141312      0.317052
max  2.346148  0.969360  1.956509      0.965632

The augmentations we implemented are small rotations, height shifts, channel shifts, and shears; larger rotations and shifts would destroy too much information. We did not implement width shifts, as some images contain multiple digits next to the center digit. Since the images are grayscale, we also implemented a custom augmentation function that inverts the black and white values. Using these augmentations we achieved a validation accuracy of 96.6%, a 0.9% improvement.
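The augmentation setup, including the custom inversion function, might look like the sketch below; the exact ranges and the 50% inversion probability are assumptions:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def random_invert(img):
    """Custom augmentation: swap dark and light with 50% probability.

    Assumes grayscale images scaled to [0, 1]."""
    if np.random.rand() < 0.5:
        return 1.0 - img
    return img

datagen = ImageDataGenerator(
    rotation_range=10,        # small rotations only; exact ranges are assumptions
    height_shift_range=0.1,   # no width shifts: neighboring digits would move in
    channel_shift_range=0.1,
    shear_range=0.1,
    preprocessing_function=random_invert,
)

x = np.random.rand(8, 32, 32, 1).astype('float32')  # stand-in grayscale batch
batch = next(datagen.flow(x, batch_size=4, shuffle=False))
```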
The accuracy of the model on the test data is 96.4%. A small additional boost is possible by applying the same augmentations used during training to the test set and averaging multiple predictions. This way we increased the accuracy to 96.6%, a 0.2% improvement. Note that this method does not cause any data leakage, because we only use the test images, not the labels. This is quite good compared to state-of-the-art benchmarks, which typically reach between 96% and 99% accuracy. According to the confusion matrix, the classes 1, 2, and 7 are often confused, as are the digits 3, 5, and 9; these digits are quite similar in appearance. We plotted misclassifications of class 2: most of the errors are made on images that are unclear, noisy, or contain multiple digits.
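The prediction-averaging step can be sketched framework-agnostically; `predict_fn` stands in for the model's prediction function, and the augmentation functions are whatever transforms were used during training:

```python
import numpy as np

def tta_predict(predict_fn, images, augment_fns):
    """Average class probabilities over the originals and augmented copies."""
    total = predict_fn(images)
    for augment in augment_fns:
        total = total + predict_fn(np.stack([augment(im) for im in images]))
    return total / (1 + len(augment_fns))

# Toy demonstration with a fake predictor that returns the mean pixel value
def fake_predict(batch):
    return np.tile(batch.mean(axis=(1, 2, 3))[:, None], (1, 10))

images = np.ones((2, 4, 4, 1))
invert = lambda im: 1.0 - im          # the grayscale inversion augmentation
out = tta_predict(fake_predict, images, [invert])
```

With a real model, `predict_fn` would be e.g. `model.predict` and the final class is `out.argmax(axis=1)`.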
We plotted the activations of 8 convolutional layers located at different depths in the network. These 8 plots give an idea of the types of features that get extracted from the input image. In the first few layers the features are quite interpretable, such as edges and shapes, but deeper in the network they become more abstract. Note that the model also has two conv layers with 1x1 filters; these filters were not very interesting to show, as each would appear as a single square.
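Extracting intermediate activations for such plots can be done by building a second model that outputs the chosen layers; the tiny stand-in network and layer names below are hypothetical:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-in; in the report this would be the trained SimpleNet-style network
inputs = keras.Input(shape=(32, 32, 1))
x = layers.Conv2D(8, 3, padding='same', activation='relu', name='conv_a')(inputs)
x = layers.Conv2D(16, 3, padding='same', activation='relu', name='conv_b')(x)
model = keras.Model(inputs, x)

# A model whose outputs are the activations of the selected conv layers
layer_names = ['conv_a', 'conv_b']
activation_model = keras.Model(
    inputs=model.input,
    outputs=[model.get_layer(name).output for name in layer_names],
)

acts = activation_model.predict(np.zeros((1, 32, 32, 1)), verbose=0)
# acts[i] holds the feature maps of layer_names[i]; plot each channel with imshow
```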
Not-trainable: input_2
Not-trainable: block1_conv1
Not-trainable: block1_conv2
Not-trainable: block1_pool
Not-trainable: block2_conv1
Not-trainable: block2_conv2
Not-trainable: block2_pool
Trainable: block3_conv1
Trainable: block3_conv2
Trainable: block3_conv3
Trainable: block3_pool
Trainable: block4_conv1
Trainable: block4_conv2
Trainable: block4_conv3
Trainable: block4_pool
Trainable: block5_conv1
Trainable: block5_conv2
Trainable: block5_conv3
Trainable: block5_pool
Trainable: block6_flatten
Trainable: block6_dropout1
Trainable: block6_fc1
Trainable: block6_dropout2
Trainable: block6_fc2
Loading model from file Success!
Model: "model_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_5 (InputLayer)         [(None, 32, 32, 3)]       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0
_________________________________________________________________
block6_flatten (Flatten)     (None, 512)               0
_________________________________________________________________
block6_dropout1 (Dropout)    (None, 512)               0
_________________________________________________________________
block6_fc1 (Dense)           (None, 128)               65664
_________________________________________________________________
block6_dropout2 (Dropout)    (None, 128)               0
_________________________________________________________________
block6_fc2 (Dense)           (None, 10)                1290
=================================================================
Total params: 14,781,642
Trainable params: 14,521,482
Non-trainable params: 260,160
_________________________________________________________________

         loss  accuracy  val_loss  val_accuracy
min  0.182973  0.614126  0.245493      0.847297
max  1.180056  0.951278  0.502027      0.932398

Fully freezing the original convolutional base resulted in poor accuracy when retraining the model, and its embeddings would have given a low accuracy in 4.2. We therefore had to fine-tune the base so that the embeddings would be more useful for our data. We found that unfreezing more blocks produced better embeddings, but to avoid erasing too much of the original features we unfroze only the last three convolutional blocks and retrained them with a very small learning rate.
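The partial freezing can be sketched as follows; `weights=None` keeps the sketch lightweight (the report uses a pretrained base), and the 1e-5 learning rate and dropout rates are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# In the report the base is pretrained; use weights='imagenet' in practice
base = VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))

# Freeze blocks 1-2, unfreeze block 3 onwards (the last three conv blocks)
trainable = False
for layer in base.layers:
    if layer.name == 'block3_conv1':
        trainable = True
    layer.trainable = trainable

# Classification head matching the summary (512 -> 128 -> 10)
x = layers.Flatten()(base.output)
x = layers.Dropout(0.5)(x)  # dropout rates are assumptions
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(base.input, outputs)

# A very small learning rate to avoid erasing the pretrained features
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

The parameter count of this sketch matches the summary above (14,781,642 total).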
Pipeline(memory=None,
steps=[('classifier',
KNeighborsClassifier(algorithm='auto', leaf_size=30,
metric='minkowski', metric_params=None,
n_jobs=-1, n_neighbors=15, p=2,
weights='uniform'))],
verbose=False)
Accuracy on validation set: 0.9275
Accuracy on test set: 0.9268
Evaluation: 0.9267801389868063
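Fitting this kNN pipeline on precomputed embeddings might look like the sketch below; the random 512-dimensional vectors stand in for the VGG16 embeddings:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier

# Same settings as the printed pipeline: 15 neighbors, all CPU cores
pipeline = Pipeline(steps=[
    ('classifier', KNeighborsClassifier(n_neighbors=15, n_jobs=-1)),
])

# Stand-in embeddings; in the report these come from the fine-tuned VGG16 base
rng = np.random.RandomState(0)
X = rng.rand(100, 512)
y = rng.randint(0, 10, size=100)

pipeline.fit(X, y)
acc = pipeline.score(X, y)  # validation/test sets would be used here instead
```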
Running time: 195.08 seconds
Last modified: April 20, 2020
scikit-learn version: 0.22.2.post1